Introduction

Information on my Project

The Impact of Team Payroll on Performance and Playoff Success in Major League Baseball

For my study, I am going to analyze the relationship between team payroll and performance, including playoff success, in Major League Baseball (MLB).

Teams face the challenge of constructing competitive rosters, often under financial constraints. This requires a strategic allocation of payroll to maximize the team’s success. I will look into whether higher payrolls translate to improved regular season performance (measured by winning percentage) and greater postseason success. By examining these trends across teams and years, I will be able to draw conclusions about the role of a team’s financial investment in winning in MLB.

My dataset contains each team’s information for the years 1990-2016. I was limited to 2016 instead of the current year because the dataset does not have full salary information past 2016. Included are each team’s regular season total games played, winning percentage (wins divided by games played), postseason success (for teams that reached the playoffs that year), and total salary along with summed batting and pitching salaries. I have filtered the data in specific areas to make batting vs. pitching salary comparisons, playoff vs. non-playoff comparisons, and playoff success comparisons. The data I am using was gathered from Sean Lahman’s Baseball Database.

Research Questions

  1. Does team payroll impact win percentage in Major League Baseball?

  2. Is there a difference in winning percentage (team success) between spending more on pitching versus batting?

  3. Are teams with higher payrolls more successful in the playoffs?

Variable Information

yearID = Year

teamID = Team

G = Games played by team that year

W = Games won by team that year

L = Games lost by team that year

win_percentage = Winning Percentage for team that year (W/G)

DivWin = Did team win their respective division that year (Y/N)

WCWin = Did team win a wild card spot that year (Y/N)

LgWin = Did team win their respective league that year (Y/N)

WSWin = Did team win the World Series that year (Y/N)

round = Furthest round team advanced to that year (NA if not in playoffs)

total_salary = Team total salary for that year

total_batting_salary = Team total salary spent on batters for that year

total_pitching_salary = Team total salary spent on pitchers for that year

batting_salary_proportion = Proportion of team’s total salary spent on batters vs. total salary

pitching_salary_proportion = Proportion of team’s total salary spent on pitchers vs. total salary

Data

Summary Tables

Distribution of Team Payrolls

Average Total Salary by Year

Distribution of Total Salary by Year

Distribution of Team Batting Payrolls

Salary Spent on Batting Annually

Distribution of Team Pitching Payrolls

Salary Spent on Pitching Annually

Analysis

Summary tables show that teams spend between $50 million and $400 million annually to build competitive rosters and ultimately win a World Series. Over time, team payrolls have increased due to inflation, rising player wages, and higher revenues from TV contracts, sponsorships, and merchandise. As a result, larger market teams have more flexibility to sign star players, while smaller market teams face financial constraints.

Additionally, teams tend to spend more on batters than pitchers, driven by practical and strategic reasons. Batters are everyday players who contribute consistently, whereas pitchers, with their need for rest days, have less frequent on-field involvement. As a result, teams allocate a larger portion of their payroll to offense, with batting salaries typically ranging from 0.6 to 0.8 of total payroll, compared to pitching salaries, which generally fall between 0.2 and 0.4. This distribution has remained relatively stable over the years, with the ratio of salary spent on batting hovering around 0.7 and on pitching around 0.3.

Total Salary

Scatterplot of Team Payroll and Winning Percentage

Distribution of Win Percentage by Payroll Quintiles

Chart of Salaries vs. Winning Percentage by Year

Analysis

From our scatterplot, we see a positive correlation between total salary and winning percentage, which makes sense because teams with higher payrolls can afford better players, leading to better performance and more wins.

The quintile boxplot further supports this, showing that teams in higher payroll quintiles tend to have higher winning percentages. This indicates that teams with larger payrolls are more successful, as they can invest in higher-quality talent, which improves their chances of winning games throughout the season.

The last chart plots each team’s win percentage by year, with a trend line through the league as a whole. The darker points, which mark teams with higher salaries, tend to sit above the line. These teams tend to have greater success, showing a link between higher payrolls and better performance.

Batting vs. Pitching

Scatterplot of Team Batting Salary vs. Win Percentage

Batting Salary Quintiles and Winning Percentage

Quantile Chart for Batting Salary Proportion

Scatterplot of Team Pitching Salary vs. Win Percentage

Pitching Salary Quintiles and Winning Percentage

Quantile Chart for Pitching Salary Proportion

Playoff vs. Non-Playoffs

Playoff Team Info

Salary Distribution of Playoff Teams

Boxplot of Playoff Team Salary

Average Playoff Team Salary by Year

Non-Playoff Team Info

Salary Distribution of Non-Playoff Teams

Boxplot of Non-Playoff Team Salary

Average Non-Playoff Team Salary by Year

Playoff Success

Boxplot of Average Salary Based on Playoff Success

Analysis

The comparison between playoff and non-playoff teams reveals a clear disparity in team salaries. The histograms on the previous page show that non-playoff teams typically fall within the $50M–$199M salary range, while playoff teams display a broader distribution with more values in the $200M–$400M range. This suggests that teams with higher salaries are more likely to make the playoffs, emphasizing the correlation between payroll and postseason qualification.

The boxplots reinforce this finding, showing that the median team salary is higher for playoff teams. Additionally, the upper outliers among non-playoff teams, those spending over $250M, are exceptions, whereas such spending is common, non-outlier behavior for playoff teams. This distinction highlights the financial advantage often associated with postseason participation.

The line charts further show that, year after year, playoff teams maintain higher average salaries than non-playoff teams. This reveals a persistent trend linking greater payrolls to a better chance of reaching the playoffs.

However, when examining playoff performance, the dynamic shifts. Boxplots of World Series winners versus other playoff teams suggest that success in the postseason is less dependent on salary. While higher payrolls might secure a strong regular season record over 162 games, playoff outcomes are determined by short 5-7 game series, where salary plays a diminished role. In other words, while money can buy regular season success, postseason triumphs depend on factors beyond payroll, such as player performance under pressure, team chemistry, and managerial decisions.

Conclusion

Conclusion

In conclusion, this analysis provides valuable insights into the relationship between team payroll and success in Major League Baseball. First, regarding the question of whether team payroll impacts win percentage, the data shows a positive correlation: teams with higher payrolls tend to achieve higher win percentages. This supports the notion that investing in better players contributes to regular-season success.

When examining the difference in success between spending more on pitching versus batting, the results reveal intriguing patterns. Teams that allocated a higher proportion of their payroll to pitching tended to have better win percentages, as seen in the positive correlation between pitching salary proportions and success. Conversely, the scatterplots showed a slightly negative relationship between batting salary proportions and win percentage. While spending more overall on payroll enhances team performance, these findings suggest that prioritizing pitching investments beyond the league average may provide a strategic advantage.

Finally, the relationship between payroll and playoff success is less straightforward. While teams with higher payrolls are more likely to make the playoffs, success in the postseason appears less dependent on salary. Factors like matchups, individual player performance, and variability in short series likely diminish the impact of payroll during the playoffs.

Ultimately, this study highlights the importance of both total spending and strategic allocation. Teams aiming to maximize success should focus not only on building a competitive roster but also on the optimal distribution of their payroll across key areas like pitching and batting.

Limitations

This analysis has several limitations. First, it only covers the years 1990-2016, excluding more recent data and missing changes across MLB history. Additionally, payroll alone does not fully determine success, as some teams achieve high performance with modest budgets, while others underperform despite high spending. Playoff performance also introduces unpredictability, as the short series format diminishes the impact of salary differences. The focus on salary proportions for batting and pitching oversimplifies resource allocation, ignoring other factors like defense or bench contributions. Lastly, while correlations are observed, this analysis does not establish causation, overlooking factors like coaching, player health, and in-game decisions.

Resources

Sean Lahman’s Baseball Database (https://sabr.org/lahman-database/)

Specifically the batting, pitching, salary, teams, and postseason datasets.

---
title: "MLB Salary and Winning"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "darkblue"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
library(pacman)

batting<-read_csv("~/Desktop/MTH 209/Final Project Datasets/Batting.csv")
pitching<-read_csv("~/Desktop/MTH 209/Final Project Datasets/Pitching.csv")
salaries<-read_csv("~/Desktop/MTH 209/Final Project Datasets/Salaries (1).csv")
teams<-read_csv("~/Desktop/MTH 209/Final Project Datasets/Teams.csv")
postseason<-read_csv("~/Desktop/MTH 209/Final Project Datasets/SeriesPost.csv")

# Merge Teams with Postseason
# Stack winners and losers so every team that played a postseason series is
# included, even teams that lost their first series
postseason_teams <- bind_rows(
  postseason %>% select(yearID, teamID = teamIDwinner, round),
  postseason %>% select(yearID, teamID = teamIDloser, round))

# Keep one row per team and year: the furthest round reached
# (wild card < division series < league championship < World Series)
postseason_teams <- postseason_teams %>%
  mutate(round_rank = case_when(
    round == "WS" ~ 4,
    str_detect(round, "CS") ~ 3,
    str_detect(round, "DS") ~ 2,
    TRUE ~ 1)) %>%
  group_by(yearID, teamID) %>%
  slice_max(round_rank, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(yearID, teamID, round)

# Teams that missed the playoffs keep round = NA
teams_by_year <- teams %>%
  left_join(postseason_teams, by = c("yearID", "teamID"))

#Filtering the Data from 1990 to 2016
teams_by_year <- teams_by_year %>%
  filter(yearID >= 1990 & yearID <= 2016)

#Adding Win Percentage Variable
teams_by_year <- teams_by_year %>%
  mutate(win_percentage = W / G)

# Merging salaries with batting data
batting_salaries <- batting %>%
  left_join(salaries, by = c("playerID", "yearID")) %>%
  filter(yearID >= 1990 & yearID <= 2016)

# Merge salaries with pitching data
pitching_salaries <- pitching %>%
  left_join(salaries, by = c("playerID", "yearID")) %>%
  filter(yearID >= 1990 & yearID <= 2016)

# Sum batting salaries for each team by year
team_batting_salaries <- batting_salaries %>%
  group_by(teamID.x, yearID) %>%
  summarize(total_batting_salary = sum(salary, na.rm = TRUE))

# Sum pitching salaries for each team by year
team_pitching_salaries <- pitching_salaries %>%
  group_by(teamID.x, yearID) %>%
  summarize(total_pitching_salary = sum(salary, na.rm = TRUE))

#Merging the batting and pitching total salaries
team_salaries <- team_batting_salaries %>%
  full_join(team_pitching_salaries, by = c("teamID.x", "yearID"))
team_salaries <- team_salaries %>%
  rename(teamID = teamID.x)
teams_with_salaries <- teams_by_year %>%
  left_join(team_salaries, by = c("teamID", "yearID"))

teams_with_salaries <- teams_with_salaries %>% 
  select(-c("R":"park"))

teams_with_salaries <- teams_with_salaries %>% 
  mutate(total_salary=total_batting_salary+total_pitching_salary)

# Creating Variable for Proportion of Total Salary Spent on Batting vs. Pitching
teams_with_salaries <- teams_with_salaries %>%
  mutate(pitching_salary_proportion = total_pitching_salary/total_salary,
         batting_salary_proportion = total_batting_salary/total_salary)

#My Dataset only has salary info until 2016, so I will be using the years of 1990-2016 in my project analysis.
```

Introduction
===

Column {data-width=650}
---

### Information on my Project
**The Impact of Team Payroll on Performance and Playoff Success in Major League Baseball** 

For my study, I am going to analyze the relationship between team payroll and performance, including playoff success, in Major League Baseball (MLB).

Teams face the challenge of constructing competitive rosters, often under financial constraints. This requires a strategic allocation of payroll to maximize the team's success. I will look into whether higher payrolls translate to improved regular season performance (measured by winning percentage) and greater postseason success. By examining these trends across teams and years, I will be able to draw conclusions about the role of a team's financial investment in winning in MLB.

My dataset contains each team's information for the years 1990-2016. I was limited to 2016 instead of the current year because the dataset does not have full salary information past 2016. Included are each team's regular season total games played, winning percentage (wins divided by games played), postseason success (for teams that reached the playoffs that year), and total salary along with summed batting and pitching salaries. I have filtered the data in specific areas to make batting vs. pitching salary comparisons, playoff vs. non-playoff comparisons, and playoff success comparisons. The data I am using was gathered from Sean Lahman's Baseball Database.

### Research Questions
1. Does team payroll impact win percentage in Major League Baseball?

2. Is there a difference in winning percentage (team success) between spending
more on pitching versus batting?

3. Are teams with higher payrolls more successful in the playoffs?

Column {data-width=350}
---

### Variable Information

yearID = Year

teamID = Team

G = Games played by team that year

W = Games won by team that year

L = Games lost by team that year

win_percentage = Winning Percentage for team that year (W/G)

DivWin = Did team win their respective division that year (Y/N)

WCWin = Did team win a wild card spot that year (Y/N)

LgWin = Did team win their respective league that year (Y/N)

WSWin = Did team win the World Series that year (Y/N)

round = Furthest round team advanced to that year (NA if not in playoffs)

total_salary = Team total salary for that year

total_batting_salary = Team total salary spent on batters for that year

total_pitching_salary = Team total salary spent on pitchers for that year

batting_salary_proportion = Proportion of team's total salary spent on batters vs. total salary

pitching_salary_proportion = Proportion of team's total salary spent on pitchers vs. total salary

Data
===

```{r}
DT::datatable(teams_with_salaries, rownames = FALSE, options = list(
                columnDefs = list(list(className = 'dt-center', 
                                       targets = 1:5)), pageLength = 10))
```

Summary Tables
===

Column {.tabset data-width=700} 
---

### Distribution of Team Payrolls
```{r}
# Convert dollars to millions so the axis matches its label
ggplot(teams_with_salaries, aes(x = total_salary / 1e6)) +
  geom_histogram(binwidth = 50, fill = "lightblue", color = "black") +
  labs(
    title = "Distribution of Team Payrolls",
    x = "Payroll (in millions)",
    y = "Frequency")
```

### Average Total Salary by Year
```{r}
average_salary_by_year <- teams_with_salaries %>%
  group_by(yearID) %>%
  summarize(average_salary = mean(total_salary, na.rm = TRUE)) %>%
  arrange(yearID) 
ggplot(average_salary_by_year, aes(x = yearID, y = average_salary)) +
  geom_line(color = "blue") +
  labs(
    title = "Average Payroll by Year",
    x = "Year",
    y = "Average Salary")
```

### Distribution of Total Salary by Year
```{r}
boxplot(total_salary ~ yearID, teams_with_salaries,
        main = "Distribution of Total Salaries by Year",
        xlab = "Year",
        ylab = "Total Salary (USD)",
        las = 2,
        col = "lightblue",
        border = "darkblue",
        cex.axis = 0.8, 
        cex.lab = 1)
```

### Distribution of Team Batting Payrolls 
```{r}
ggplot(teams_with_salaries, aes(x = batting_salary_proportion)) +
  geom_histogram(binwidth = 0.01, fill = "blue", color = "black") +
  labs(title = "Distribution of Team Batting Payroll Proportion",
       x = "Batting Payroll Proportion",
       y = "Frequency")
```

### Salary Spent on Batting Annually
```{r}
average_salaries <- teams_with_salaries %>%
  group_by(yearID) %>%
  summarize(
    avg_batting_salary = mean(batting_salary_proportion, na.rm = TRUE),
    avg_pitching_salary = mean(pitching_salary_proportion, na.rm = TRUE))
plot(average_salaries$yearID, average_salaries$avg_batting_salary, 
     type = "l", col = "blue", 
     xlab = "Year", ylab = "Average Salary Proportion", 
     main = "Average Salary Spent on Batting By Year")
```

### Distribution of Team Pitching Payrolls
```{r}
ggplot(teams_with_salaries, aes(x = pitching_salary_proportion)) +
  geom_histogram(binwidth = 0.01, fill = "green", color = "black") +
  labs(title = "Distribution of Team Pitching Payroll Proportion",
       x = "Pitching Payroll Proportion",
       y = "Frequency")
```

### Salary Spent on Pitching Annually
```{r}
plot(average_salaries$yearID, average_salaries$avg_pitching_salary, 
     type = "l", col = "blue", 
     xlab = "Year", ylab = "Average Salary Proportion", 
     main = "Average Salary Spent on Pitching By Year")
```


Column {data-width=300} 
---

### Analysis
Summary tables show that teams spend between $50 million and $400 million annually to build competitive rosters and ultimately win a World Series. Over time, team payrolls have increased due to inflation, rising player wages, and higher revenues from TV contracts, sponsorships, and merchandise. As a result, larger market teams have more flexibility to sign star players, while smaller market teams face financial constraints.

Additionally, teams tend to spend more on batters than pitchers, driven by practical and strategic reasons. Batters are everyday players who contribute consistently, whereas pitchers, with their need for rest days, have less frequent on-field involvement. As a result, teams allocate a larger portion of their payroll to offense, with batting salaries typically ranging from 0.6 to 0.8 of total payroll, compared to pitching salaries, which generally fall between 0.2 and 0.4. This distribution has remained relatively stable over the years, with the ratio of salary spent on batting hovering around 0.7 and on pitching around 0.3.


Total Salary
===

Column {.tabset data-width=700}
---

### Scatterplot of Team Payroll and Winning Percentage
```{r}
# Convert dollars to millions so the axis matches its label
ggplot(teams_with_salaries, aes(x = total_salary / 1e6, y = win_percentage)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Team Payroll vs. Win Percentage",
    x = "Total Team Payroll (in millions)",
    y = "Win Percentage")
```

### Distribution of Win Percentage by Payroll Quintiles
```{r}
boxplot(win_percentage ~ ntile(total_salary, 5), data = teams_with_salaries,
        main = "Win Percentage by Payroll Quintile",
        xlab = "Payroll Quintile", 
        ylab = "Win Percentage", 
        col = c("red", "orange", "yellow", "green", "blue"))
```

### Chart of Salaries vs. Winning Percentage by Year
```{r}
ggplot(teams_with_salaries, aes(x = yearID, y = win_percentage, color = total_salary)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "darkblue") +
  scale_color_gradient(low = "lightblue", high = "darkblue") +
  labs(
    title = "Trend of Salaries vs. Wins Over Time",
    x = "Year",
    y = "Win Percentage")
```

Column {data-width=300} 
---

### Analysis
From our scatterplot, we see a positive correlation between total salary and winning percentage, which makes sense because teams with higher payrolls can afford better players, leading to better performance and more wins.
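
As a rough illustration of how the strength of this relationship could be quantified, the sketch below computes a correlation coefficient and a fitted slope. The `toy` data frame and its values are hypothetical and are not taken from my dataset; with the real data, `teams_with_salaries` would take its place.

```r
# Hypothetical toy data: payroll in millions of dollars and win percentage
toy <- data.frame(
  total_salary_m = c(60, 85, 110, 140, 175, 210),
  win_percentage = c(0.44, 0.47, 0.50, 0.51, 0.55, 0.56))

# Pearson correlation between payroll and win percentage
payroll_cor <- cor(toy$total_salary_m, toy$win_percentage)

# Simple linear regression: expected change in win percentage per extra $1M
payroll_fit <- lm(win_percentage ~ total_salary_m, data = toy)
slope_per_million <- unname(coef(payroll_fit)["total_salary_m"])

print(round(payroll_cor, 3))
print(slope_per_million)
```

A correlation near 1 and a positive slope would match the upward trend line in the scatterplot, while a value near 0 would mean payroll explains little of the variation in wins.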

The quintile boxplot further supports this, showing that teams in higher payroll quintiles tend to have higher winning percentages. This indicates that teams with larger payrolls are more successful, as they can invest in higher-quality talent, which improves their chances of winning games throughout the season.

The last chart plots each team's win percentage by year, with a trend line through the league as a whole. The darker points, which mark teams with higher salaries, tend to sit above the line. These teams tend to have greater success, showing a link between higher payrolls and better performance.
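
The payroll quintile comparison can also be summarized as a table of group averages. The sketch below is a minimal base R version on hypothetical values; the `toy` data frame is made up for illustration, and the rank-based grouping is only an approximation of what `ntile()` does in the boxplot code.

```r
# Hypothetical toy data: ten teams with payroll (in millions) and win percentage
toy <- data.frame(
  total_salary_m = c(55, 70, 80, 95, 105, 120, 135, 150, 180, 220),
  win_percentage = c(0.43, 0.46, 0.45, 0.49, 0.50, 0.50, 0.52, 0.53, 0.55, 0.57))

# Assign each team to a payroll quintile (1 = lowest fifth, 5 = highest fifth)
toy$payroll_quintile <- ceiling(rank(toy$total_salary_m) * 5 / nrow(toy))

# Average win percentage within each quintile
quintile_means <- tapply(toy$win_percentage, toy$payroll_quintile, mean)
print(quintile_means)
```

If the averages rise from the first quintile to the fifth, that agrees with the pattern in the boxplot.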

Batting vs. Pitching 
===

Column {.tabset data-width=500}
---

### Scatterplot of Team Batting Salary vs. Win Percentage
```{r}
ggplot(teams_with_salaries, aes(x = batting_salary_proportion, y = win_percentage)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Proportion of Payroll Spent on Batting vs. Win Percentage",
    x = "Batting Proportion of Payroll",
    y = "Win Percentage")
```

### Batting Salary Quintiles and Winning Percentage
```{r}
# ntile(..., 5) splits teams into five equal-sized groups (quintiles)
teams_with_salaries <- teams_with_salaries %>%
  mutate(batting_salary_quintiles = ntile(total_batting_salary, 5))

ggplot(teams_with_salaries, aes(x = factor(batting_salary_quintiles), y = win_percentage, fill = factor(batting_salary_quintiles))) +
  geom_boxplot() +
  labs(
    title = "Winning Percentage by Quintiles of Batting Salary",
    x = "Batting Salary Quintile",
    y = "Win Percentage")
```


### Quantile Chart for Batting Salary Proportion
```{r}
teams_with_salaries <- teams_with_salaries %>%
  mutate(batting_salary_quantile = cut(
    batting_salary_proportion, 
    breaks = seq(0.56, 0.91, length.out = 6),
    labels = c("Q1: 0.56-0.63", "Q2: 0.63-0.70", "Q3: 0.70-0.77", "Q4: 0.77-0.84", "Q5: 0.84-0.91")))

quantile_summary <- teams_with_salaries %>%
  group_by(batting_salary_quantile) %>%
  summarize(avg_win_percentage = mean(win_percentage, na.rm = TRUE), .groups = "drop")

ggplot(quantile_summary, aes(x = batting_salary_quantile, y = avg_win_percentage, fill = batting_salary_quantile)) +
  geom_col() +
  labs(title = "Average Win Percentage by Batting Salary Proportion Quantiles",
       x = "Batting Salary Proportion Quantiles",
       y = "Average Win Percentage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```


Column {.tabset data-width=500}
---

### Scatterplot of Team Pitching Salary vs. Win Percentage
```{r}
ggplot(teams_with_salaries, aes(x = pitching_salary_proportion, y = win_percentage)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Proportion of Payroll Spent on Pitching vs. Win Percentage",
    x = "Pitching Proportion of Payroll",
    y = "Win Percentage")
```

### Pitching Salary Quintiles and Winning Percentage
```{r}
# ntile(..., 5) splits teams into five equal-sized groups (quintiles)
teams_with_salaries <- teams_with_salaries %>%
  mutate(pitching_salary_quintiles = ntile(total_pitching_salary, 5))

ggplot(teams_with_salaries, aes(x = factor(pitching_salary_quintiles), y = win_percentage, fill = factor(pitching_salary_quintiles))) +
  geom_boxplot() +
  labs(
    title = "Winning Percentage by Quintiles of Pitching Salary",
    x = "Pitching Salary Quintile",
    y = "Win Percentage")
```

### Quantile Chart for Pitching Salary Proportion
```{r}
teams_with_salaries <- teams_with_salaries %>%
  mutate(pitching_salary_quantile = cut(
    pitching_salary_proportion, 
    breaks = seq(0.09, 0.44, length.out = 6),
    labels = c("Q1: 0.09-0.16", "Q2: 0.16-0.23", "Q3: 0.23-0.30", "Q4: 0.30-0.37", "Q5: 0.37-0.44")))

quantile_summary_pitching <- teams_with_salaries %>%
  group_by(pitching_salary_quantile) %>%
  summarize(avg_win_percentage = mean(win_percentage, na.rm = TRUE), .groups = "drop")

ggplot(quantile_summary_pitching, aes(x = pitching_salary_quantile, y = avg_win_percentage, fill = pitching_salary_quantile)) +
  geom_col() +
  labs(title = "Average Win Percentage by Pitching Salary Proportion Quantiles",
       x = "Pitching Salary Proportion Quantiles",
       y = "Average Win Percentage") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```


Playoff vs. Non-Playoffs
===

Column {.tabset data-width=500}
---
Playoff Team Info
```{r setup2, include=FALSE}
playoff_teams <- teams_with_salaries %>%
  filter(!is.na(round))
non_playoff_teams <- teams_with_salaries %>%
  filter(is.na(round))
```

### Salary Distribution of Playoff Teams
```{r}
ggplot(playoff_teams, aes(x = total_salary)) +
  geom_histogram(binwidth = 100000000, fill = "blue", color = "black") +
  labs(
    title = "Salary Distribution of Playoff Teams",
    x = "Total Salary",
    y = "Frequency")
```

### Boxplot of Playoff Team Salary
```{r}
ggplot(playoff_teams, aes(x = "", y = total_salary)) +
  geom_boxplot(fill = "blue") +
  labs(
    title = "Boxplot of Salaries: Playoff Teams",
    x = "",
    y = "Total Salary")
```

### Average Playoff Team Salary by Year
```{r}
playoff_avg_salary <- playoff_teams %>%
  group_by(yearID) %>%
  summarize(avg_salary = mean(total_salary, na.rm = TRUE))

ggplot(playoff_avg_salary, aes(x = yearID, y = avg_salary)) +
  geom_line(color = "blue") +
  geom_point(color = "blue") +
  labs(
    title = "Average Salary Over Time: Playoff Teams",
    x = "Year",
    y = "Average Salary")
```


Column {.tabset data-width=500}
---
Non-Playoff Team Info 

### Salary Distribution of Non-Playoff Teams
```{r}
# Plot for Non-Playoff Teams
ggplot(non_playoff_teams, aes(x = total_salary)) +
  geom_histogram(binwidth = 100000000, fill = "red", alpha = 0.7, color = "black") +
  labs(
    title = "Salary Distribution of Non-Playoff Teams",
    x = "Total Salary",
    y = "Frequency")
```

### Boxplot of Non-Playoff Team Salary
```{r}
ggplot(non_playoff_teams, aes(x = "", y = total_salary)) +
  geom_boxplot(fill = "red", alpha = 0.7) +
  labs(
    title = "Boxplot of Salaries: Non-Playoff Teams",
    x = "",
    y = "Total Salary")
```

### Average Non-Playoff Team Salary by Year
```{r}
non_playoff_avg_salary <- non_playoff_teams %>%
  group_by(yearID) %>%
  summarize(avg_salary = mean(total_salary, na.rm = TRUE))

ggplot(non_playoff_avg_salary, aes(x = yearID, y = avg_salary)) +
  geom_line(color = "red") +
  geom_point(color = "red") +
  labs(
    title = "Average Salary Over Time: Non-Playoff Teams",
    x = "Year",
    y = "Average Salary")
```


Playoff Success
===
```{r setup3, include=FALSE}
world_series_winners <- playoff_teams %>%
  filter(WSWin == "Y")
# League winners who lost the World Series, so the three groups do not overlap
league_winners <- playoff_teams %>%
  filter(LgWin == "Y" & WSWin != "Y")
non_winners <- playoff_teams %>%
  filter(WSWin != "Y" & LgWin != "Y")
```

Column {data-width=650}
---
### Boxplot of Average Salary Based on Playoff Success
```{r}
# Combine datasets and add a label for each group
combined_data <- bind_rows(
  world_series_winners %>% mutate(Group = "World Series Winners"),
  league_winners %>% mutate(Group = "League Winners"),
  non_winners %>% mutate(Group = "Non WS or League Winners"))

ggplot(combined_data, aes(x = Group, y = total_salary, fill = Group)) +
  geom_boxplot() +
  labs(
    title = "Comparison of Average Salary Across Playoff Outcomes",
    x = "Outcome",
    y = "Total Salary",
    fill = "Group") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

Column {data-width=350}
---
### Analysis
The comparison between playoff and non-playoff teams reveals a clear disparity in team salaries. The histograms on the previous page show that non-playoff teams typically fall within the $50M–$199M salary range, while playoff teams display a broader distribution with more values in the $200M–$400M range. This suggests that teams with higher salaries are more likely to make the playoffs, emphasizing the correlation between payroll and postseason qualification.

The boxplots reinforce this finding, showing that the median team salary is higher for playoff teams. Additionally, the upper outliers among non-playoff teams, those spending over $250M, are exceptions, whereas such spending is common, non-outlier behavior for playoff teams. This distinction highlights the financial advantage often associated with postseason participation.

The line charts further show that, year after year, playoff teams maintain higher average salaries than non-playoff teams. This reveals a persistent trend linking greater payrolls to a better chance of reaching the playoffs.

However, when examining playoff performance, the dynamic shifts. Boxplots of World Series winners versus other playoff teams suggest that success in the postseason is less dependent on salary. While higher payrolls might secure a strong regular season record over 162 games, playoff outcomes are determined by short 5-7 game series, where salary plays a diminished role. In other words, while money can buy regular season success, postseason triumphs depend on factors beyond payroll, such as player performance under pressure, team chemistry, and managerial decisions.
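
The playoff versus non-playoff salary comparison could also be checked numerically rather than only read off the charts. The sketch below is a minimal example on hypothetical values; the `toy` data frame is made up, and only the `round` column and the rule "NA means no playoffs" follow my dataset.

```r
# Hypothetical toy data: payroll in millions and furthest playoff round (NA = no playoffs)
toy <- data.frame(
  total_salary_m = c(70, 90, 95, 120, 130, 160, 185, 200),
  round = c(NA, NA, NA, "ALDS1", NA, "NLCS", "WS", "WS"))

# A team made the playoffs if it has a recorded postseason round
toy$made_playoffs <- !is.na(toy$round)

# Median and mean payroll for each group (FALSE = non-playoff, TRUE = playoff)
median_by_group <- tapply(toy$total_salary_m, toy$made_playoffs, median)
mean_by_group <- tapply(toy$total_salary_m, toy$made_playoffs, mean)
print(median_by_group)
print(mean_by_group)
```

Reporting both statistics is a small design choice: the boxplots draw the median, while the yearly line charts use the mean, so having the two side by side makes the charts easier to compare.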

Conclusion
===
Column {data-width=600}
---
### Conclusion
In conclusion, this analysis provides valuable insights into the relationship between team payroll and success in Major League Baseball. First, regarding the question of whether team payroll impacts win percentage, the data shows a positive correlation: teams with higher payrolls tend to achieve higher win percentages. This supports the notion that investing in better players contributes to regular-season success.

When examining the difference in success between spending more on pitching versus batting, the results reveal intriguing patterns. Teams that allocated a higher proportion of their payroll to pitching tended to have better win percentages, as seen in the positive correlation between pitching salary proportions and success. Conversely, the scatterplots showed a slightly negative relationship between batting salary proportions and win percentage. While spending more overall on payroll enhances team performance, these findings suggest that prioritizing pitching investments beyond the league average may provide a strategic advantage.
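
One detail worth keeping in mind when reading these two results: because total salary is defined as batting salary plus pitching salary, the batting and pitching proportions always sum to 1, so their correlations with win percentage are mirror images of each other. The sketch below shows this on hypothetical values; the numbers in `toy` are made up, and only the column definitions follow the setup code.

```r
# Hypothetical toy data: batting and pitching salary totals (in millions) and win percentage
toy <- data.frame(
  total_batting_salary = c(70, 80, 65, 90, 75, 60),
  total_pitching_salary = c(30, 25, 35, 20, 30, 40),
  win_percentage = c(0.50, 0.46, 0.53, 0.44, 0.49, 0.56))

# Same definitions as in the setup chunk
toy$total_salary <- toy$total_batting_salary + toy$total_pitching_salary
toy$batting_salary_proportion <- toy$total_batting_salary / toy$total_salary
toy$pitching_salary_proportion <- toy$total_pitching_salary / toy$total_salary

cor_batting <- cor(toy$batting_salary_proportion, toy$win_percentage)
cor_pitching <- cor(toy$pitching_salary_proportion, toy$win_percentage)

# The two shares sum to 1, so the correlations are equal in size and opposite in sign
print(c(batting = cor_batting, pitching = cor_pitching))
```

In other words, the positive pitching relationship and the negative batting relationship are the same finding viewed from two sides, not two independent pieces of evidence.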

Finally, the relationship between payroll and playoff success is less straightforward. While teams with higher payrolls are more likely to make the playoffs, success in the postseason appears less dependent on salary. Factors like matchups, individual player performance, and variability in short series likely diminish the impact of payroll during the playoffs.

Ultimately, this study highlights the importance of both total spending and strategic allocation. Teams aiming to maximize success should focus not only on building a competitive roster but also on the optimal distribution of their payroll across key areas like pitching and batting.

Column {data-width=400}
---
### Limitations
This analysis has several limitations. First, it only covers the years 1990-2016, excluding more recent data and missing changes across MLB history. Additionally, payroll alone does not fully determine success, as some teams achieve high performance with modest budgets, while others underperform despite high spending. Playoff performance also introduces unpredictability, as the short series format diminishes the impact of salary differences. The focus on salary proportions for batting and pitching oversimplifies resource allocation, ignoring other factors like defense or bench contributions. Lastly, while correlations are observed, this analysis does not establish causation, overlooking factors like coaching, player health, and in-game decisions.
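
One further caveat concerns how the payroll totals are built. In the Lahman data, pitchers also have rows in the batting table, so summing batting-side and pitching-side salaries separately can count a pitcher's salary in both. The sketch below shows one possible alternative that sums directly from the salary table. The toy tables, team codes, and player IDs are hypothetical, and this is not the method used in this dashboard.

```r
# Hypothetical toy tables shaped like the Lahman Salaries and Pitching tables
salaries_toy <- data.frame(
  yearID = c(2016, 2016, 2016, 2016),
  teamID = c("AAA", "AAA", "AAA", "BBB"),
  playerID = c("hitter01", "pitcher01", "hitter02", "pitcher02"),
  salary = c(10e6, 8e6, 2e6, 5e6))
pitching_toy <- data.frame(
  yearID = c(2016, 2016),
  playerID = c("pitcher01", "pitcher02"))

# Flag each salary row as pitcher or non-pitcher, so nobody lands in both groups
pitcher_keys <- paste(pitching_toy$playerID, pitching_toy$yearID)
salaries_toy$is_pitcher <-
  paste(salaries_toy$playerID, salaries_toy$yearID) %in% pitcher_keys

# Total payroll comes straight from the salary table: one row per player, team, and year
total_payroll <- aggregate(salary ~ teamID + yearID, data = salaries_toy, FUN = sum)
pitching_payroll <- aggregate(salary ~ teamID + yearID,
                              data = salaries_toy[salaries_toy$is_pitcher, ], FUN = sum)
print(total_payroll)
print(pitching_payroll)
```

With totals built this way, each salary is counted once, and the batting share would simply be whatever is left after the pitching share.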

### Resources
Sean Lahman's Baseball Database (https://sabr.org/lahman-database/)

Specifically the batting, pitching, salary, teams, and postseason datasets.